Getting the data

Delivery data

The actual historical delivery data of each Crop Protection (CP) brand and SKU of each country was extracted with the following SQL query from the Redshift database. This example gives data of India.

select
    distinct del.<delivery_id>,
    mp.plant_id,
    mm.material_id as SKU,
    mm.<brand name> as brand,
    del.delivery_date,
    mpld.product_line,
    del.<quantity delivered>
from
    <delivery table> del
    join <material table> mm on del.material_id = mm.material_id
    join <plant table> mp on del.plant_id = mp.plant_id
    join <product line table> mpld on mm.uc_productlineowner = mpld.product_line
    join <billing table> mbf on del.delivery_number = mbf.delivery_number
    join <material type table> mma_geb on mma_geb.material_guid = mm.material_guid
where
    del.source_system_cde = <SAP>
    and mp.country = 'IN' -- India - Change here to include/remove any country
    and not(<plant> IS <obsolete>)
    and del.<delivery type> = <GSAP>
    and mm.record_type = 'CP' -- Crop Protection products only
    AND del.delivery_date > '2016/01/01' -- historical data from 2016 is extracted
    AND mpld.<product line> IN <relevant product lines>
    and mbf.<billing document number> is not null 
    and del.<quantity delivered> > 0
group by
    1,
    2,
    3,
    4,
    5,
    6,
    7
order by
    5

This query would be executed on the 1st day of every month to get the latest delivery data for each country.

Weather data

Weather is an important factor affecting agriculture. Weather data was updated on the first of every month.

Since fetching/updating the weather data was done though company specific tools, it cannot be documented here.

Economic Indicators data

World Bank indicators of the country can also prove to be useful for forecasting. This was fetched with the wb api:

all_indicator_ids = [s['id'] for s in wb.series.list()]
df =  wb.data.DataFrame(all_indicator_ids, country_code, time=time_range, labels=True)
df['Series'] = df['Series'].apply(lambda x: re.sub('[^A-Za-z0-9_]+', ' ', x))
df.dropna(how='any', inplace = True)
df = df.set_index('Series').T.reset_index(names = 'year')
df['year'] = df['year'].str.replace('YR', '')
df['year'] = df['year'].astype('int32')
df['year'] = df['year'] + 1
df.to_parquet(f'data/{country}/economic_indicators.parquet', index=False)

This would also be executed on the 1st day of every month to get the latest world bank data for each country.

IBP

IBP forecasts were extracted with the following SQL query from the Redshift database. The following example gives IBP forecasts of India for the year 2023 as on January 1st 2023.

select
    mm.material_id as SKU,
    mm.<brand name> as brand,
    <forecasted period> as date,
    sum(<forecasted quantity>),
from
    <IBP sales forecast table> sf
    join <material table> mm on sf.material_id = mm.material_id -- to get brand name
where
    <validity date> <= '2023-01-01'
    and <forecasted period> between '2023-01-01' and '2023-12-31' -- Set desired forecast period
    and <forecasting customer country> in ('IN') -- Set desired countries
group by
    1, 
    2, 
    3

The dates would be changed and the query would be executed every month to get the latest forecasts on the 1st of every month. The process was automated in the pipeline.

The material ids of the IBP dataset would be matched with the SKUs of delivery data to filter only CP products.

Getting the data

Delivery data​

Weather data​

Economic Indicators data​

IBP​

Delivery data

Weather data

Economic Indicators data

IBP